[alpaka] Refactor prefixScan implementation #220
base: master
20130b8 to fb7bd6f
Fixed conflicts and applied code formatting.
be2894f to d427564
Rebased and fixed conflicts.
d427564 to f8a75ea
Rebased and fixed conflicts.
makortel left a comment
In general looks ok.
On Cori (with CUDA 11.2) I got the following failure when running
I'm really puzzled what
Here is a stack trace of the exception
f8a75ea to 3ba0e0d
Fixed conflicts, rebased, etc.
3ba0e0d to 63cae86
While the validation is good, I now see a small but systematic loss in performance.
Before:
After:
So 2-3% slower.
The `prefixScan` algorithm is implemented in Alpaka using two kernels, while a single kernel is used for native CUDA.
I refactored the `prefixScan` implementation to use a single kernel (similar to the native CUDA implementation).
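
For readers unfamiliar with the structure being discussed, here is a minimal CPU-only sketch of a blocked prefix scan in plain C++. It is not the alpaka or CUDA code from this PR; the names `scanBlock` and `blockedPrefixScan` and the `blockSize` parameter are hypothetical and exist only for illustration. Phase 1 scans each block independently and records its total; phase 2 adds the accumulated totals of the preceding blocks. In a two-kernel design each phase would be a separate launch, while a single-kernel design has to run both phases inside one launch (one common approach, assumed here and not confirmed by this PR, is to let the last block that finishes phase 1 perform phase 2).

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper: inclusive scan of data[begin, end) in place.
// Returns the total of that block (its last scanned element).
inline int scanBlock(std::vector<int>& data, std::size_t begin, std::size_t end) {
    std::partial_sum(data.begin() + begin, data.begin() + end, data.begin() + begin);
    return end > begin ? data[end - 1] : 0;
}

// Hypothetical sketch of a blocked inclusive prefix scan.
// The loops over blocks stand in for GPU blocks running in parallel.
inline std::vector<int> blockedPrefixScan(std::vector<int> data, std::size_t blockSize) {
    if (data.empty() || blockSize == 0) {
        return data;
    }
    const std::size_t nBlocks = (data.size() + blockSize - 1) / blockSize;
    std::vector<int> blockSums(nBlocks);

    // Phase 1: every block scans its own chunk and stores its total.
    for (std::size_t b = 0; b < nBlocks; ++b) {
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, data.size());
        blockSums[b] = scanBlock(data, begin, end);
    }

    // Phase 2: add the running total of all preceding blocks to each block.
    int offset = 0;
    for (std::size_t b = 1; b < nBlocks; ++b) {
        offset += blockSums[b - 1];
        const std::size_t begin = b * blockSize;
        const std::size_t end = std::min(begin + blockSize, data.size());
        for (std::size_t i = begin; i < end; ++i) {
            data[i] += offset;
        }
    }
    return data;
}
```

Calling `blockedPrefixScan({1, 2, 3, 4, 5, 6, 7}, 3)` splits the input into three blocks and should give the same result as a plain `std::partial_sum` over the whole vector.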